-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add comments into the AST #14352
base: main
Are you sure you want to change the base?
Add comments into the AST #14352
Conversation
3eb21c3
to
5dcb6e7
Compare
lib/elixir/lib/code.ex
Outdated
include_comments = Keyword.get(opts, :include_comments, false) | ||
|
||
Process.put(:code_formatter_comments, []) | ||
opts = [preserve_comments: &preserve_comments/5] ++ opts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I told you privately that we could make preserve_comments: :ast
an option but I now realized that preserve_comments
is actually a private option, so the :include_comments
option is better.
dc05897
to
518a43e
Compare
518a43e
to
a1580da
Compare
a1580da
to
b12a9a3
Compare
There's a few issues I still need to tackle and a few more cleanups are pending(mostly around the code for formatting interpolations and the repetition of comment docs generation in general), but I think this is generally ready for an initial round of reviews There are 3 tests currently failing that are related to binary operators: 1) test case with when and clause comment (Code.Formatter.IntegrationTest)
test/elixir/code_formatter/integration_test.exs:298
Assertion with == failed
code: assert IO.iodata_to_binary(Code.format_string!(good, opts)) == String.trim(good)
left: "case decomposition do\n <<h, _::binary>> when h != ?< ->\n decomposition =\n decomposition\n |> :binary.split(\" \", [:global])\n |> Enum.map(&String.to_integer(&1, 16))\n\n Map.put(dacc, String.to_integer(codepoint, 16), decomposition)\n\n _ ->\n dacc\nend"
right: "case decomposition do\n # Decomposition\n <<h, _::binary>> when h != ?< ->\n decomposition =\n decomposition\n |> :binary.split(\" \", [:global])\n |> Enum.map(&String.to_integer(&1, 16))\n\n Map.put(dacc, String.to_integer(codepoint, 16), decomposition)\n\n _ ->\n dacc\nend"
stacktrace:
test/elixir/code_formatter/integration_test.exs:299: (test)
2) test comment inside operator with when (Code.Formatter.IntegrationTest)
test/elixir/code_formatter/integration_test.exs:590
Assertion with == failed
code: assert IO.iodata_to_binary(Code.format_string!(bad, opts)) == result
left: "raise # Comment\n function(x) ::\n any"
right: "# Comment\nraise function(x) ::\n any"
stacktrace:
test/elixir/code_formatter/integration_test.exs:597: (test)
3) test first argument in a call without parens with comments (Code.Formatter.IntegrationTest)
test/elixir/code_formatter/integration_test.exs:468
Assertion with == failed
code: assert IO.iodata_to_binary(Code.format_string!(good, opts)) == String.trim(good)
left: "with bar ::\n :ok\n | :invalid\n | :other"
right: "with bar ::\n :ok\n | :invalid\n # | :unknown\n | :other"
stacktrace:
test/elixir/code_formatter/integration_test.exs:469: (test) Once those are fixed, running |
This implements AST comments and updates the formatter to look for comments in the AST instead of a separate comments list.
This adds a new
:include_comments
toCode.string_to_quoted/2
andCode.string_to_quoted!/2
.Comments are attached to AST nodes in three variants:
:leading_comments
: Comments that are located before a node, or in the same line.Examples:
:trailing_comments
: Comments that are located after a node, and before the end of the parent enclosing the node(or the root document).Examples:
:inner_comments
: Comments that are located inside an empty node.Examples:
A comment may be considered inner or trailing depending on wether the enclosing node is empty or not. For example, in the following code:
The comment is considered inner because the
do
block is empty. However, in the following code:The comment is considered trailing to
bar
because thedo
block is not empty.In the case no nodes are present in the AST but there are comments, they are inserted into the root
:__block__
node as:inner_comments
.Many of the ideas here come from Sourceror, which takes inspiration for the
recast
parser. Inrecast
, comments are attached to nodes with theleading
andtrailing
boolean flags. If a comment has both attributes set tofalse
, it represents an "inner" comment. For the sake of clarity I decided to define it explicitly for the Elixir AST.An additional difference is that
recast
stores all the comments(leading, trailing, inner) mixed together under a singlecomments
field. I find that separating them to be cleaner and easier to work with, since you don't have to split or filter the list to get a certain kind of comment from a node.Different from Sourceror which supports only
:leading_comments
and:trailing_comments
, it introduces:inner_comments
to remove the ambiguity between comments before theend
keyword, and comments after an expression. This also becomes important when working on the formatter code, as it is very awkward to inject inner comments after the call args were already converted to algebra docs. The Sourceror implementation relies on the formatter behavior to produce ast and comments that the formatter would interpret correctly, meaning it can get away with a sub-optimal comment merging strategy; it only needs the formatter to work. This PR on the contrary tries to attach comments where they make the most sense, and updates the formatter accordingly instead of wrangling with comment line numbers.The current implementation in this PR adds an extra traversal step after the code is initially parsed. Due to this, and due to the logic to merge comments and AST requiring precise line information in every AST nodes, it requires to also set up the following options:
TODO